
How Good Is Multi-Pivot Quicksort?

Author: Aumüller Martin
Dietzfelbinger Martin
Klaue Pascal
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 31/05/2016

Multi-pivot quicksort refers to variants of classical quicksort in which the partitioning step uses $k$ pivots to split the input into $k + 1$ segments. For many years, multi-pivot quicksort was regarded as impractical, but in 2009 a 2-pivot approach by Yaroslavskiy, Bentley, and Bloch was chosen as the standard sorting algorithm in Sun's Java 7. In 2014 at ALENEX, Kushagra et al. introduced an even faster algorithm that uses three pivots. This paper studies what advantages multi-pivot quicksort might offer in general. The contributions are as follows. Natural comparison-optimal algorithms for multi-pivot quicksort are devised and analyzed. The analysis shows that the benefits of using multiple pivots with respect to the average comparison count are marginal, and that these strategies are inferior to simpler strategies such as the well-known median-of-$k$ approach. A substantial part of the partitioning cost is caused by rearranging elements. A rigorous analysis of an algorithm for rearranging elements in the partitioning step is carried out, focusing mainly on how often array cells are accessed during partitioning. The algorithm behaves best if 3 to 5 pivots are used. Experiments show that this translates into good cache behavior and that this cost measure comes closest to predicting the observed running times of multi-pivot quicksort algorithms. Finally, the paper studies how choosing pivots from a sample affects sorting cost. The study is theoretical in the sense that, although the findings motivate design recommendations for multi-pivot quicksort algorithms that lead to running time improvements over known algorithms in an experimental setting, these improvements are small.
Comment: Submitted to a journal, v2: Fixed statement of Gibbs' inequality, v3: Revised version, especially improving on the experiments in Section
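To make the $k$-pivot partitioning step concrete, here is a minimal Python sketch, not the paper's algorithm: it is not in-place, draws the $k$ pivots uniformly at random from the input, and classifies each element with a plain binary search over the sorted pivots (roughly $\log_2(k+1)$ comparisons per element) rather than with the comparison-optimal classification strategies the paper analyzes. The function name `multi_pivot_quicksort` is hypothetical.

```python
import bisect
import random
from typing import List, Optional


def multi_pivot_quicksort(a: List[int], k: int = 3,
                          rng: Optional[random.Random] = None) -> List[int]:
    """Return a sorted copy of a, splitting around k pivots per step (not in-place)."""
    if rng is None:
        rng = random.Random(0)  # fixed seed only to make runs reproducible
    if len(a) <= k:
        return sorted(a)  # base case: too few elements to draw k pivots
    pivots = sorted(rng.sample(a, k))  # p_1 <= ... <= p_k, drawn from the input
    segments: List[List[int]] = [[] for _ in range(k + 1)]  # strictly between pivots
    equal: List[List[int]] = [[] for _ in range(k)]  # elements equal to pivot i
    for x in a:
        # Binary search: i is the number of pivots strictly smaller than x.
        i = bisect.bisect_left(pivots, x)
        if i < k and pivots[i] == x:
            equal[i].append(x)
        else:
            segments[i].append(x)
    result: List[int] = []
    for i in range(k + 1):
        result.extend(multi_pivot_quicksort(segments[i], k, rng))
        if i < k:
            result.extend(equal[i])
    return result


# multi_pivot_quicksort([5, 3, 9, 1, 5, 7, 2, 8], k=3) -> [1, 2, 3, 5, 5, 7, 8, 9]
```

Elements equal to a pivot are collected separately, so every recursive call receives strictly fewer elements than its parent, which guarantees termination even with many duplicate keys.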

Simple and Fast BlockQuicksort using Lomuto's Partitioning Scheme

Author: Aumüller Martin
Hass Nikolaj
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 29/10/2018

This paper presents simple variants of the BlockQuicksort algorithm described by Edelkamp and Weiss (ESA 2016). The simplification is achieved by using Lomuto's partitioning scheme instead of Hoare's crossing pointer technique to partition the input. To achieve a robust sorting algorithm that works well on many different input types, the paper introduces a novel two-pivot variant of Lomuto's partitioning scheme. A surprisingly simple twist to the generic two-pivot quicksort approach makes the algorithm robust. The paper provides an analysis of the theoretical properties of the proposed algorithms and compares them to their competitors. The analysis shows that Lomuto-based approaches incur a higher average sorting cost than the Hoare-based approach of BlockQuicksort. Moreover, the analysis is particularly useful to reason about pivot choices that suit the two-pivot approach. An extensive experimental study shows that, despite their worse theoretical behavior, the simpler variants perform as well as the original version of BlockQuicksort.
Comment: Accepted at ALENEX 201
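Lomuto's scheme keeps a single boundary index and, while scanning left to right, swaps each element smaller than the pivot to that boundary. The block idea from BlockQuicksort can be grafted onto it by first recording, for a block of elements, the offsets of elements smaller than the pivot in a buffer, and only then performing the swaps; in C the recording step can be written without branches. The Python sketch below shows that structure only (Python gains no branch-prediction benefit). The names `block_lomuto_partition` and `quicksort`, the last-element pivot, and the tiny block size are illustrative assumptions, and this is the plain single-pivot variant, not the paper's robust two-pivot scheme.

```python
from typing import List, Optional


def block_lomuto_partition(a: List[int], lo: int, hi: int, block: int = 4) -> int:
    """Partition a[lo..hi] around the pivot a[hi] and return the pivot's final index.

    Invariant (Lomuto): a[lo:i] < pivot <= a[i:j]. Each block of the scan first
    records offsets of elements smaller than the pivot, then swaps them to a[i].
    """
    pivot = a[hi]
    offsets = [0] * block  # offset buffer, reused for every block
    i = lo
    j = lo
    while j < hi:
        size = min(block, hi - j)
        num = 0
        for t in range(size):
            # Branch-free recording step: always write, advance only on "smaller".
            offsets[num] = t
            num += a[j + t] < pivot
        for t in range(num):
            pos = j + offsets[t]
            a[i], a[pos] = a[pos], a[i]
            i += 1
        j += size
    a[i], a[hi] = a[hi], a[i]  # move the pivot between the two parts
    return i


def quicksort(a: List[int], lo: int = 0, hi: Optional[int] = None) -> None:
    """Sort a[lo..hi] in place; recurse on the smaller part to bound stack depth."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        p = block_lomuto_partition(a, lo, hi)
        if p - lo < hi - p:
            quicksort(a, lo, p - 1)
            lo = p + 1
        else:
            quicksort(a, p + 1, hi)
            hi = p - 1


# data = [3, 7, 1, 9, 4, 4, 0]; quicksort(data)  ->  data == [0, 1, 3, 4, 4, 7, 9]
```

Recording offsets for a whole block before swapping performs exactly the same swaps as sequential Lomuto, because the scan only ever compares elements that no earlier swap has touched. Like any single-pivot Lomuto partition with a strict `<` test, it degrades to quadratic time when all keys are equal.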

Benchmarking Nearest Neighbor Search: Influence of Local Intrinsic Dimensionality and Result Diversity in Real-World Datasets

Author: Aumüller Martin
Ceccarello Matteo
Publication date: 01/01/2019

Archivio istituzionale della ricerca - Università di Padova

Solving k-Closest Pairs in High-Dimensional Data

Author: Aumüller Martin
Ceccarello Matteo
Publication date: 01/01/2023

Reproducibility Companion Paper: Visual Sentiment Analysis for Review Images with Item-Oriented and User-Oriented CNN

Author: Aumüller Martin
Lauw
Nitta
Truong
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

National Research Foundation (NRF) Singapore under NRF Fellowship Programm

Institutional Knowledge at Singapore Management University

ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms

Author: Aumüller Martin
Bernhardsson Erik
Faithfull Alexander
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

This paper describes ANN-Benchmarks, a tool for evaluating the performance of in-memory approximate nearest neighbor algorithms. It provides a standard interface for measuring the performance and quality achieved by nearest neighbor algorithms on different standard data sets. It supports several different ways of integrating $k$-NN algorithms, and its configuration system automatically tests a range of parameter settings for each algorithm. Algorithms are compared with respect to many different (approximate) quality measures, and adding more is easy and fast; the included plotting front-ends can visualise these as images, LaTeX plots, and websites with interactive plots. ANN-Benchmarks aims to provide a constantly updated overview of the current state of the art of $k$-NN algorithms. In the short term, this overview allows users to choose the correct $k$-NN algorithm and parameters for their similarity search task; in the longer term, algorithm designers will be able to use this overview to test and refine automatic parameter tuning. The paper gives an overview of the system, evaluates the results of the benchmark, and points out directions for future work. Interestingly, very different approaches to $k$-NN search yield comparable quality-performance trade-offs. The system is available at http://ann-benchmarks.com.
Comment: Full version of the SISAP 2017 conference paper. v2: Updated the abstract to avoid arXiv linking to the wrong UR
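As a rough illustration of the kind of quality measure such a benchmark reports, the sketch below computes recall for a $k$-NN query: the fraction of returned points whose distance to the query is at most the true $k$-th nearest distance (counting by distance rather than by identity tolerates ties). It is not ANN-Benchmarks' code; `exact_knn`, `candidate_knn` (a hypothetical stand-in for an approximate index that inspects only a subset of points), `recall_at_k`, and `evaluate` are illustrative names, and throughput is measured naively as queries per second.

```python
import math
import time
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def exact_knn(data: List[Vector], q: Vector, k: int) -> List[int]:
    """Ground truth: indices of the k points nearest to q (Euclidean), by linear scan."""
    return sorted(range(len(data)), key=lambda i: math.dist(data[i], q))[:k]


def candidate_knn(data: List[Vector], q: Vector, k: int, candidates: List[int]) -> List[int]:
    """Hypothetical approximate search: only the given candidate ids are inspected."""
    return sorted(candidates, key=lambda i: math.dist(data[i], q))[:k]


def recall_at_k(data: List[Vector], q: Vector, returned: List[int], k: int) -> float:
    """Fraction of the first k returned ids lying within the true k-th nearest distance."""
    truth = exact_knn(data, q, k)
    threshold = math.dist(data[truth[-1]], q)
    hits = sum(1 for i in set(returned[:k]) if math.dist(data[i], q) <= threshold)
    return hits / k


def evaluate(data: List[Vector], queries: List[Vector], k: int,
             search: Callable[[Vector], List[int]]) -> Tuple[float, float]:
    """Return (mean recall, queries per second) for one search configuration."""
    start = time.perf_counter()
    results = [search(q) for q in queries]
    elapsed = time.perf_counter() - start
    mean_recall = sum(recall_at_k(data, q, res, k)
                      for q, res in zip(queries, results)) / len(queries)
    qps = len(queries) / elapsed if elapsed > 0 else float("inf")
    return mean_recall, qps


# data = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)], q = (0.0, 0.0), k = 2:
# candidate_knn(data, q, 2, [1, 2, 3]) -> [1, 2]; recall_at_k(data, q, [1, 2], 2) -> 0.5
```

Plotting mean recall against queries per second for several parameter settings of one algorithm yields the kind of quality-performance trade-off curve on which the abstract compares different algorithms.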

Parameter-free Locality Sensitive Hashing for Spherical Range Reporting

Author: Ahle Thomas Dybdahl
Aumüller Martin
Pagh Rasmus
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 20/07/2016

We present a data structure for *spherical range reporting* on a point set $S$, i.e., reporting all points in $S$ that lie within radius $r$ of a given query point $q$. Our solution builds upon the Locality-Sensitive Hashing (LSH) framework of Indyk and Motwani, which represents the asymptotically best solutions to near neighbor problems in high dimensions. While traditional LSH data structures have several parameters whose optimal values depend on the distance distribution from $q$ to the points of $S$, our data structure is parameter-free, except for the space usage, which is configurable by the user. Nevertheless, its expected query time basically matches that of an LSH data structure whose parameters have been *optimally chosen for the data and query* in question under the given space constraints. In particular, our data structure provides a smooth trade-off between hard queries (typically addressed by standard LSH) and easy queries such as those where the number of points to report is a constant fraction of $S$, or where almost all points in $S$ are far away from the query point. In contrast, known data structures fix LSH parameters based on certain parameters of the input alone. The algorithm has expected query time bounded by $O(t (n/t)^\rho)$, where $t$ is the number of points to report and $\rho \in (0,1)$ depends on the data distribution and the strength of the LSH family used. We further present a parameter-free way of using multi-probing, for LSH families that support it, and show that for many such families this approach allows us to get expected query time close to $O(n^\rho + t)$, which is the best we can hope to achieve using LSH. The previously best running time in high dimensions was $\Omega(t n^\rho)$. For many data distributions where the intrinsic dimensionality of the point set close to $q$ is low, we can give improved upper bounds on the expected query time.
Comment: 21 pages, 5 figures, due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract appearing here is slightly shorter than that in the PDF fil
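For readers unfamiliar with the LSH framework, the following Python sketch shows the traditional, fixed-parameter structure that the abstract contrasts with: $L$ hash tables, each keyed by a concatenation of $K$ hash functions, where a query collects all points colliding with $q$ and then filters them by exact distance, so every reported point truly lies within radius $r$ (points within $r$ may still be missed). It assumes the Gaussian (2-stable) hash family for Euclidean distance, $h(v) = \lfloor (a \cdot v + b)/w \rfloor$; the class name `EuclideanLSH` and the defaults for $L$, $K$, and the bucket width $w$ are arbitrary assumptions. It does not implement the paper's parameter-free selection of parameters or its multi-probing scheme.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


class EuclideanLSH:
    """Fixed-parameter LSH index for spherical range reporting under Euclidean distance.

    num_tables (L) tables; in each, a point is keyed by k (K) concatenated hashes
    h(v) = floor((a . v + b) / w) with a ~ N(0, I) and b ~ Uniform[0, w).
    """

    def __init__(self, points: List[Point], num_tables: int = 10, k: int = 4,
                 w: float = 4.0, seed: int = 0) -> None:
        rng = random.Random(seed)
        dim = len(points[0])
        self.points = points
        self.w = w
        # funcs[j] holds the K (a, b) pairs whose hash values form the key in table j.
        self.funcs = [[([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
                       for _ in range(k)]
                      for _ in range(num_tables)]
        self.tables: List[Dict[Tuple[int, ...], List[int]]] = [
            defaultdict(list) for _ in range(num_tables)]
        for pid, p in enumerate(points):
            for j, table in enumerate(self.tables):
                table[self._key(j, p)].append(pid)

    def _key(self, j: int, v: Point) -> Tuple[int, ...]:
        """Concatenate the K hash values of v for table j."""
        return tuple(math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
                     for a, b in self.funcs[j])

    def query(self, q: Point, r: float) -> List[int]:
        """Ids of indexed points within distance r of q that collide with q in some table."""
        candidates = set()
        for j, table in enumerate(self.tables):
            # .get avoids inserting empty buckets into the defaultdict.
            candidates.update(table.get(self._key(j, q), ()))
        # The exact distance check removes every false positive.
        return sorted(pid for pid in candidates if math.dist(self.points[pid], q) <= r)
```

Choosing $K$ and $L$ is exactly the tuning problem described above: a larger $K$ yields fewer far-away candidates per table but a lower chance that a near point collides with $q$, which must be compensated for by more tables.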